Fuzzy retrieval techniques are based on the Extended Boolean model and fuzzy set theory. There are two classical fuzzy retrieval models: Mixed Min and Max (MMM) and the Paice model. Neither model provides a way of evaluating query weights; the P-norms algorithm, however, does take them into account.

==Mixed Min and Max model (MMM)==
In fuzzy set theory, an element has a varying degree of membership, say ''dA'', in a given set ''A'', instead of the traditional binary membership choice (is an element/is not an element). In MMM, each index term has a fuzzy set associated with it. A document's weight with respect to an index term ''A'' is considered to be the degree of membership of the document in the fuzzy set associated with ''A''. The degrees of membership for union and intersection are defined as follows in fuzzy set theory:
: <math>d_{A \cup B} = \max(d_A, d_B)</math>
: <math>d_{A \cap B} = \min(d_A, d_B)</math>

Accordingly, documents that should be retrieved for a query of the form ''A or B'' should be in the fuzzy set associated with the union of the two sets ''A'' and ''B''. Similarly, documents that should be retrieved for a query of the form ''A and B'' should be in the fuzzy set associated with the intersection of the two sets. Hence, the similarity of a document to the ''or'' query can be defined as ''max(dA, dB)'', and the similarity of the document to the ''and'' query as ''min(dA, dB)''.

The MMM model softens the Boolean operators by taking the query-document similarity to be a linear combination of the ''min'' and ''max'' document weights. Given a document ''D'' with index-term weights ''dA1, dA2, ..., dAn'' for terms ''A1, A2, ..., An'', and the queries:
: ''Qor = (A1 or A2 or ... or An)''
: ''Qand = (A1 and A2 and ... and An)''
the query-document similarity in the MMM model is computed as follows:
: ''SlM(Qor, D) = Cor1 * max(dA1, dA2, ..., dAn) + Cor2 * min(dA1, dA2, ..., dAn)''
: ''SlM(Qand, D) = Cand1 * min(dA1, dA2, ..., dAn) + Cand2 * max(dA1, dA2, ..., dAn)''
where ''Cor1, Cor2'' are "softness" coefficients for the ''or'' operator, and ''Cand1, Cand2'' are softness coefficients for the ''and'' operator. Because the maximum of the document weights should count for more in an ''or'' query and the minimum should count for more in an ''and'' query, generally ''Cor1 > Cor2'' and ''Cand1 > Cand2''. For simplicity, it is generally assumed that ''Cor1 = 1 - Cor2'' and ''Cand1 = 1 - Cand2''.

Experiments by Lee and Fox indicate that the best performance usually occurs with ''Cand1'' in the range (0.8 ) and with ''Cor1'' > 0.2. In general, the computational cost of MMM is low, and retrieval effectiveness is much better than with the Standard Boolean model.

Excerpt source: ''Wikipedia'', the free encyclopedia.
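The two MMM similarity formulas can be illustrated with a minimal Python sketch. It assumes the simplification stated in the text (''Cor2 = 1 - Cor1'' and ''Cand2 = 1 - Cand1''); the function names <code>mmm_or</code> and <code>mmm_and</code>, the default coefficient of 0.7, and the sample weights are hypothetical choices made for illustration and do not come from the source.

```python
from typing import Sequence


def mmm_or(weights: Sequence[float], c_or1: float = 0.7) -> float:
    """MMM similarity of a document to the query (A1 or A2 or ... or An).

    `weights` holds the document's index-term weights dA1..dAn (non-empty).
    `c_or1` is the softness coefficient Cor1; 0.7 is an illustrative default.
    """
    c_or2 = 1.0 - c_or1  # simplifying assumption: Cor1 = 1 - Cor2
    return c_or1 * max(weights) + c_or2 * min(weights)


def mmm_and(weights: Sequence[float], c_and1: float = 0.7) -> float:
    """MMM similarity of a document to the query (A1 and A2 and ... and An).

    `c_and1` is the softness coefficient Cand1; 0.7 is an illustrative default.
    """
    c_and2 = 1.0 - c_and1  # simplifying assumption: Cand1 = 1 - Cand2
    return c_and1 * min(weights) + c_and2 * max(weights)


if __name__ == "__main__":
    doc_weights = [0.2, 0.9, 0.5]  # hypothetical term weights for one document
    print("or  similarity:", mmm_or(doc_weights))
    print("and similarity:", mmm_and(doc_weights))
    # With a coefficient of 1.0 the operators reduce to the plain fuzzy-set
    # max (union) and min (intersection).
    print("pure fuzzy or :", mmm_or(doc_weights, c_or1=1.0))
    print("pure fuzzy and:", mmm_and(doc_weights, c_and1=1.0))
```

Setting either coefficient to 1.0 recovers the unsoftened fuzzy-set operators, while lower values let the other extreme of the document weights contribute to the score. Each function makes a single pass for the maximum and one for the minimum, which is consistent with the low computational cost noted above.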